Fix OTTERS lassosum selector parity by hsun3163 · Pull Request #482 · StatFunGen/pecotmr

hsun3163 · 2026-04-23T19:58:04Z

Summary

This PR fixes the OTTERS lassosum regression by replacing the default OTTERS selector with the LD-quadratic pseudovalidation score:

score(beta) = (c^T beta) / sqrt(beta^T R beta)

where:

- c is the aligned summary-statistics correlation vector
- R is the supplied LD correlation matrix
- beta is one candidate on the lassosum (s, lambda) path

This also removes the earlier genotype-format-specific selector patch. min(fbeta) is kept only as an explicit debug option.
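As a minimal sketch of the selector described above (not the package's R implementation), the score can be computed for each candidate and the highest-scoring (s, lambda) pair kept. The function name `ld_quadratic_score`, the toy inputs, and the choice to give a candidate with a non-positive quadratic form a score of negative infinity are all assumptions made for illustration.

```python
import math

def ld_quadratic_score(beta, c, R):
    """Return (c^T beta) / sqrt(beta^T R beta) for one candidate."""
    numerator = sum(ci * bi for ci, bi in zip(c, beta))
    # beta^T R beta, accumulated row by row
    quad = sum(bi * sum(rij * bj for rij, bj in zip(row, beta))
               for bi, row in zip(beta, R))
    if quad <= 0.0:
        # Assumption: an all-zero (or degenerate) candidate can never be selected
        return float("-inf")
    return numerator / math.sqrt(quad)

# Toy inputs, made up for illustration only
c = [0.30, 0.25, -0.10]            # summary-statistics correlations
R = [[1.0, 0.5, 0.0],              # LD correlation matrix
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
candidates = {                     # (s, lambda) -> beta on the path
    (0.2, 1e-4): [0.20, 0.10, -0.05],
    (1.0, 1e-4): [0.30, 0.00, 0.00],
    (1.0, 1e-1): [0.00, 0.00, 0.00],
}

scores = {key: ld_quadratic_score(beta, c, R) for key, beta in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

Only c, R, and the candidate betas enter the computation, which is the point of the selector: no genotype matrix is needed at selection time.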

Root Cause

Old OTTERS did not select lassosum models by min(fbeta). It fit the beta path and then used lassosum pseudovalidation to choose the final (s, lambda).

The refactor changed that selector to min(fbeta), and the OTTERS wrapper also double-scaled the lassosum input before it reached the low-level solver.

Fixture 206 isolates the selector bug cleanly:

- old saved vs old direct published lassosum: Pearson 1.0, 0 opposite-sign variants
- corrected-scaling + min(fbeta): Pearson about 0.360, 1309 opposite-sign variants

Published lassosum selected s = 0.2, lambda = 1e-4, while min(fbeta) selected s = 1, lambda = 1e-4 on the same grid. This is not a grid-definition problem. It is a selector regression.
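The two diagnostics quoted for fixture 206 (Pearson correlation between weight vectors and the count of opposite-sign variants) can be sketched as follows. The helper names and toy weights are hypothetical, and treating a variant with a zero weight in either vector as not opposite-sign is an assumption; the PR does not state how zeros were counted.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length weight vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def opposite_sign_count(u, v):
    """Count variants whose weights are both nonzero and differ in sign."""
    # Assumption: a zero weight on either side is not counted as a sign flip
    return sum(1 for a, b in zip(u, v) if a * b < 0)

# Toy weight vectors, made up for illustration only
reference = [0.10, -0.20, 0.00, 0.05]
identical = [0.10, -0.20, 0.00, 0.05]
flipped = [0.10, 0.20, 0.00, -0.05]

print(pearson(reference, identical), opposite_sign_count(reference, identical))
print(pearson(reference, flipped), opposite_sign_count(reference, flipped))
```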

Mathematical Rationale

Old pseudovalidation can be written as:

scaled_beta = beta / sd
pred = X * scaled_beta
score = (c^T beta) / sqrt(Var(pred))

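Dividing beta by the per-variant sd is equivalent to centering and standardizing the reference matrix columns by the same per-variant scale. Var(pred) is then beta^T R beta, where R is the correlation matrix of those columns, so the score becomes: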

score(beta) = (c^T beta) / sqrt(beta^T R beta)

So the selector can be evaluated directly from summary-statistics correlation and LD, without using genotype explicitly.
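The equivalence can be checked numerically on a toy example. The sketch below computes the genotype-based score (beta divided by sd, then the variance of the prediction) and the LD-quadratic score from the same small matrix and compares them. The matrix, beta, and c are made up, and the sketch assumes the sample variance and the correlation matrix both use the same n - 1 denominator.

```python
import math

def sample_cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Toy reference genotype matrix (rows are samples, columns are variants)
X = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 2],
     [0, 0, 1]]
beta = [0.20, -0.10, 0.05]
c = [0.30, -0.20, 0.10]

p = len(beta)
cols = [[row[j] for row in X] for j in range(p)]
sd = [math.sqrt(sample_cov(col, col)) for col in cols]
numerator = sum(ci * bi for ci, bi in zip(c, beta))

# Genotype form: scaled_beta = beta / sd, pred = X * scaled_beta
scaled_beta = [b / s for b, s in zip(beta, sd)]
pred = [sum(x * sb for x, sb in zip(row, scaled_beta)) for row in X]
score_genotype = numerator / math.sqrt(sample_cov(pred, pred))

# LD form: R is the correlation matrix of the same columns
R = [[sample_cov(cols[i], cols[j]) / (sd[i] * sd[j]) for j in range(p)]
     for i in range(p)]
quad = sum(beta[i] * R[i][j] * beta[j] for i in range(p) for j in range(p))
score_ld = numerator / math.sqrt(quad)

print(score_genotype, score_ld)
```

The two scores agree up to floating-point rounding because Var(X (beta / sd)) expands to (beta / sd)^T Cov(X) (beta / sd), and dividing each covariance entry by the two corresponding sds gives R.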

Validation

PLINK1 source: genotype matrix vs LD-quadratic

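The LD-quadratic score matches PLINK1 genotype pseudovalidation almost exactly: on both fixtures below the two select the same best candidate, with a Pearson correlation of at least 0.9999999.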

Fixture 161:
- PLINK1 genotype best: soft_lambda=0.041050213
- PLINK1 LD-quadratic best: soft_lambda=0.041050213
- Pearson 0.9999999
- same best candidate TRUE

Fixture 206:
- PLINK1 genotype best: soft_lambda=0.029906976
- PLINK1 LD-quadratic best: soft_lambda=0.029906976
- Pearson 1.0000000
- same best candidate TRUE

This validates the selector formula itself.

Sketch source: sample matrix vs LD-quadratic

For the sketch source, the sample-matrix pseudovalidation and the LD-quadratic score are the same numeric object once both are built from the same restored sketch matrix and the same column standardization.

Fixture 161:
- sketch sample-matrix best: soft_lambda=0.021788613
- sketch LD-quadratic best: soft_lambda=0.021788613
- Pearson 1.0
- max absolute difference < 1e-15
- same best candidate TRUE

So the remaining mismatch is not between sample-matrix pseudovalidation and quadratic LD scoring. It is between the current sketch-derived standardized LD path and the PLINK1/genotype-backed standardized LD path.

What This PR Changes

R/regularized_regression.R

- fixes the OTTERS lassosum scaling contract so correlation input is only converted once before the low-level solver
- makes lassosum_rss_weights() use ld_quadratic by default
- keeps min(fbeta) only as an explicit debug option
- preserves first-max tie behavior for equal selector scores
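"First-max" tie behavior means that when several candidates share the highest selector score, the one that appears earliest on the path is kept. A minimal sketch, with a hypothetical helper name and toy scores (this is not the package's code):

```python
def first_max_index(scores):
    """Return the index of the first occurrence of the maximum score."""
    if not scores:
        raise ValueError("scores must not be empty")
    best_idx = 0
    for i, s in enumerate(scores):
        # Strict comparison: a later tie never replaces an earlier winner
        if s > scores[best_idx]:
            best_idx = i
    return best_idx

# Toy scores with a tie between positions 1 and 2
toy_scores = [0.10, 0.40, 0.40, 0.20]
print(first_max_index(toy_scores))
```

R's `which.max()` also returns the first maximum, so a selector built on it would behave the same way; whether the package uses it is not stated here.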

R/otters.R

- passes correlation-scale statistics into lassosum explicitly via stat$cor and stat$z
- removes the temporary genotype-source and variant-metadata plumbing that was only needed for the earlier compatibility patch

gaow closed this Apr 24, 2026

gaow reopened this Apr 24, 2026

Hao Sun added 4 commits April 24, 2026 13:50

Fix OTTERS lassosum selector parity

59f7e05

Update lassosum PR note with LD-only selector rationale

5aeaaa5

Use LD-quadratic lassosum selection in OTTERS

e940950

Clarify source-matched lassosum validation note

e01db04

danielnachun force-pushed the fix/otters-lassosum-selector branch from 9f378b1 to e01db04 April 24, 2026 20:53

Update documentation

9e93ee9

danielnachun merged commit 20862ff into StatFunGen:main Apr 24, 2026
5 checks passed

danielnachun merged 5 commits into StatFunGen:main from hsun3163:fix/otters-lassosum-selector